Menstrual cycle features in mothers and daughters in the Avon Longitudinal Study of Parents and Children (ALSPAC)

Problematic menstrual cycle features, including irregular periods, severe pain, heavy bleeding, absence of periods, frequent or infrequent cycles, and premenstrual symptoms, are experienced by high proportions of females and can have substantial impacts on their health and well-being. However, research aimed at identifying causes and risk factors associated with such menstrual cycle features is sparse and limited. This data note describes prospective, longitudinal data collected in a UK birth cohort, the Avon Longitudinal Study of Parents and Children (ALSPAC), on menstrual cycle features, which can be utilised to address the research gaps in this area. Data were collected across 21 timepoints (between the average age of 28.6 and 57.7 years) in mothers (G0) and 20 timepoints (between the average age of 8 and 24 years) in index daughters (G1) between 1991 and 2020. This data note details all available variables, proposes methods to derive comparable variables across data collection timepoints, and discusses important limitations specific to each menstrual cycle feature. Also, the data note identifies broader issues for researchers to consider when utilising the menstrual cycle feature data, such as hormonal contraception, pregnancy, breastfeeding, and menopause, as well as missing data and misclassification.


Introduction
Problematic menstrual cycle features have been reported to affect high proportions of adolescent girls, women, and people who menstruate. Previous research has indicated that 11-46% of females experience irregular periods [1-6], 42-95% dysmenorrhea (menstrual pain) [1-4,7], 3-13% amenorrhea (absence of periods) [5,8,9], 37-78% premenstrual syndrome (PMS) [1,4], 1-19% frequent or infrequent cycles (less than 24 or more than 38 days) [1,8,10], and 4-58% heavy menstrual bleeding (HMB) [4,8,11]. Whilst most of these estimates come from research conducted in high-income countries (HICs), research conducted in low- and middle-income countries (LMICs) indicates similar prevalence ranges. However, the estimates vary considerably, possibly due to factors such as age, the setting and population under study, how each feature is measured and defined, and contextual attitudes towards menstruation. Such features may be related to other conditions, such as endometriosis, polycystic ovary syndrome (PCOS), or fibroids; however, many are idiopathic [10,12,13]. These relatively common problematic menstrual cycle features have been associated with several adverse physical and mental health outcomes, including infertility, anaemia, pain sensitivity, sleep disturbances, anxiety, and depression [7,8,14]. Research has also demonstrated a substantial impact on social wellbeing, with negative effects on school, work, relationships, exercise, and health-related quality of life [8,15].
Despite this, there has been relatively limited research into the causes of, and risk factors for, problematic menstrual cycle features. Previous research has highlighted ethnicity [16], family history [1,2,7,12], smoking [6,12], high or low BMI [2,6,7], earlier age at menarche [2,6,7,12], presence of other menstrual problems [1,7,12], and low socioeconomic position (SEP) [1,2,6,14,16,17] as possible risk factors. However, much of the evidence comes from cross-sectional studies and from research in clinical populations or university students. This means that many findings may not be representative of broader populations, reports of menstrual cycle features may be affected by recall bias, and there is limited understanding of whether associations reflect causal effects or other explanations (e.g., confounding), or of the direction of any causal effect. Further research is therefore needed to understand the burden, life course risk factors, causes, and consequences of problematic menstrual cycle features, and to provide insights into the causes and consequences of within- and between-woman variation in menstrual features. This requires greater use of prospective, longitudinal data on the menstrual cycle, as well as on factors that may be confounders, mediators, or moderators, to improve the accuracy of reporting and enable exploration of causal relationships.
Whilst few resources provide such rich menstrual cycle data, the Avon Longitudinal Study of Parents and Children (ALSPAC) is a longitudinal birth cohort with repeated measures data on features of the menstrual cycle across two generations of females, as well as data on a wide range of physical, psychological, social, and genetic factors. This data note aims to promote the use of the detailed menstrual cycle data available in ALSPAC to address these research needs. We describe the menstrual cycle data collected in the mothers (G0) and index daughters (G1) enrolled in ALSPAC, including repeated measures on menstrual cycle features throughout puberty and into early adulthood from questionnaire and clinic assessments.

ALSPAC
ALSPAC is a longitudinal birth cohort in which pregnant women resident in Avon, UK, with expected dates of delivery between 1st April 1991 and 31st December 1992 were invited to take part in the study. The initial number of pregnancies enrolled was 14,541, resulting in 13,988 children who were alive at 1 year of age.
When the oldest children were approximately 7 years of age, an attempt was made to bolster the initial sample with eligible cases who had failed to join the study originally. The total sample size for analyses using any data collected after the age of 7 is therefore 15,447 pregnancies, resulting in 14,901 children who were alive at 1 year of age. There were 14,203 unique mothers initially enrolled in the study but, as a result of the additional phases of recruitment, 14,833 unique mothers had enrolled in ALSPAC as of September 2021.
Study data gathered from participants at 22 years and onwards were collected and managed using Research Electronic Data Capture (REDCap) tools hosted at the University of Bristol [18]. REDCap is a secure, web-based software platform designed to support data capture for research studies.
Further details on ALSPAC have been published elsewhere [19-21]. Please note that the study website contains details of all the data that are available through a fully searchable data dictionary and variable search tool (http://www.bristol.ac.uk/alspac/researchers/our-data/). Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees (ALEC; IRB00003312). Detailed information can be found on the study website (http://www.bristol.ac.uk/alspac/researchers/research-ethics/). Implicit consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time.

This data note focuses on two samples, G0 (mothers) and G1 (daughters), who were asked about several menstrual cycle features at multiple timepoints. The G0 sample comprises mothers who were asked to report on their menstrual cycle features from the index pregnancy to menopause. Figure 1 shows the number of G0 participants responding to questions about periods at each relevant timepoint, and whether they had experienced a recent period. The G1 sample comprises female offspring for whom information about menstrual cycle features was provided, either by their mother or by themselves, from age 8 to age 24. Figure 2 shows the number of respondents who had started their periods up to age 21.

Mothers (G0) Data
Participants answered questions relating to menstrual cycle length, absence of periods, regularity, heaviness, pain, and PMS-related symptoms across 17 questionnaires, from the index pregnancy (reporting on menstruation prior to pregnancy) up to 28 years after the birth of the index child (mean age 28.6 to 57.7 years). Participants also provided data on menstrual cycle features at four clinic assessments from mean age 47.4 to 52.6 years (Table 1). Table 1 provides an overview of the ALSPAC questionnaires and clinics, participants' mean age, and the relevant menstrual cycle feature variables at each timepoint, as well as the contraception variables available at each timepoint.

Cycle length
In the first three questionnaires asking about cycle length (mean age 28.6, 37.8, and 40.3 years), participants were asked, if their periods were regular, how many days there were from the start of one period to the start of the next. At the mean age 49.7 year questionnaire, participants were asked whether their periods were regular and could select either yes, alongside one of multiple options for the length of their cycle, or no. Similarly, at the mean age 51.3 year questionnaire, participants were asked how many days there usually were between the start of one period and the start of the next and were given multiple categorical options, as well as an option to report that their periods were too irregular to estimate. These questions and variables are summarised in Table 2.

Amenorrhea
At the mean age 37.8- and 40.3-year questionnaires, participants who reported not having periods were asked whether this was because of pregnancy, hysterectomy, menopause, or another reason. At the mean age 47.4 year questionnaire, participants were asked whether they had had a period in the last 12 months. Participants were also asked this at the mean age 51.3 and 57.7 year questionnaires and at all of the clinics, where they were additionally asked whether they had had a period in the last 3 months. Those who reported not having a period were subsequently asked to report the reason for this. Responses included surgery, […]. Table 3 provides a summary of these questions and responses. Moreover, as mentioned above, participants were asked to report the number of days from the start of one period to the start of the next in three questionnaires (mean age 28.6, 37.8, and 40.3 years). These variables may also be informative about amenorrhea where participants have reported long cycle lengths (e.g., 84 days, or 3 months, between two periods).

Cycle regularity
Participants were asked whether their periods were regular or not in three of the questionnaires (mean age 28.6, 37.8, and 40.3 years) and were also asked whether their periods were very, moderately, mildly, or not at all irregular in eight of the questionnaires (mean age 32.0, 33.1, 34.0, 35.1, 38.3, 41.4, 48.6, and 51.3 years). In addition, at the mean age 49.7 and 51.3 year questionnaires, participants could report that their cycle was not regular or too irregular to estimate their cycle length. Table 4 gives an overview of these variables.
In the tables, cell counts below 5 have been replaced with <5, and this may include 0.

Heavy or prolonged bleeding
G0 participants were asked about heavy bleeding and prolonged bleeding separately. Participants were asked whether their periods were very, moderately, mildly, or not at all heavy in eight questionnaires (mean age 32.0, 33.1, 34.0, 35.1, 38.3, 41.4, 48.6, and 51.3 years). […] (Table 5).

Menstrual pain
Participants were asked whether their periods were very, moderately, mildly, or not at all painful in eight questionnaires (mean age 32.0, 33.1, 34.0, 35.1, 38.3, 41.4, 48.6, and 51.3 years). […]

Daughters (G1) Data

Cycle length
In the nine puberty questionnaires, as well as at the 12.8 year clinic, participants (or their mothers up to 13.1 years) were asked to report the length of their menstrual cycle. In two later questionnaires (19.6 and 21 years), participants were asked about their cycle length but were given multiple categorical options instead of reporting the exact number of days. At the age 24 clinic, participants were asked to provide the exact number of days of their usual menstrual cycle or, if they were unsure, to select an approximate length from multiple categorical options. Table 9 provides an overview of these questions and variables.

Amenorrhea
In the 13.1, 13.8, and 16.5 year child-based questionnaires, mothers were asked whether there had been months when their daughter's periods had not happened at all (if she had started regular periods) and, if so, whether she had had a period in the last three months. Similarly, at age 21, participants were asked whether they had had a period in the last three months and, if not, why their periods had stopped. Options included surgery, chemotherapy or radiation therapy, pregnancy or breastfeeding, no obvious reason or menopause, contraception, or that their periods had not started yet. Table 10 summarises these questions. Also, as discussed above, participants reported the length of their menstrual cycle in the nine puberty questionnaires and at the 12.8 and 24 year clinics. These variables could provide further insight into amenorrhea where participants have reported long menstrual cycles (e.g., 84 days, or 3 months, between two periods).

Cycle regularity
At the first three research clinics (11.7, 12.8, and 13.8 years), participants reported whether their periods were regular or not (or that they did not know). At two later clinics (17.8 and 24 years) and two child-completed questionnaires (19.6 and 21 years), respondents were asked about the regularity or length of their cycle and, alongside a series of cycle length options, could report that their cycle was not regular or too irregular for them to estimate its length. In addition, in the mean age 21 questionnaire, participants were asked whether their periods were very, moderately, mildly, or not at all irregular. Table 11 provides an overview of these variables and summarises the responses of those who had started their periods.

Heavy or prolonged bleeding
In the nine puberty questionnaires, participants (or their mothers up to 13.1 years) were asked whether they had experienced heavy or prolonged bleeding with their period. Participants who reported heavy or prolonged bleeding were then asked whether they had contacted a doctor about this.
Participants were also asked to report the number of days of bleeding they normally experienced. If they were unsure about the exact number of days, they could instead select one of three options: 3 days or less, 4-6 days, or 7 days or more. In the age 21 questionnaire, participants were asked whether their periods were very, moderately, mildly, or not at all heavy. Table 12 provides a summary of these variables and the responses amongst G1 participants who had started their periods.

Menstrual pain
In the puberty questionnaires, except for the one at 15.5 years, participants (or their mothers up to 13.1 years) were asked whether or not they had experienced severe cramps with their periods. At age 15.5, they were instead asked whether they had experienced pain with their periods and, for those who answered yes, whether the pain was severe, moderate, or mild. All puberty questionnaires also asked participants who reported severe cramps or pain with their periods whether they had contacted a doctor about this. In the age 21 questionnaire, participants were asked whether their periods were very, moderately, mildly, or not at all painful. Table 13 provides a summary of these variables.

Premenstrual symptoms
At 21 years only (Table 14), participants were asked whether they experienced 'particular problems' in the days before or during their periods. Those who responded yes were subsequently asked which problems they experienced and were given a list of symptoms: very fatigued, irritable, depressed, anxious, or other. Respondents indicated whether they experienced each of these symptoms before their periods, during their periods, or not at all (participants could select more than one option).

Using the menstrual cycle feature data
Whilst few of the menstrual cycle feature variables are consistent across all available timepoints, it is possible to derive comparable variables for most features and timepoints. However, there are some important limitations that need to be considered when deriving such variables and planning analyses.

Cycle length
For both G0 and G1, participants reported the exact number of days in their menstrual cycle at multiple timepoints, and these data could be used as a continuous variable or to create categorical variables. For the continuous data, it is important to be aware of outliers, with a small number of G0 participants reporting cycle lengths as long as 90 days and G1 participants reporting cycle lengths of up to 150 days. It is difficult to know whether these responses reflect genuinely long cycles, for example an episode of amenorrhea, or whether they are due to data entry errors or participants misunderstanding the questions. Approaches to managing these potentially erroneous outliers will depend on the research question and could include keeping all participants in the analyses and then repeating the analyses after excluding those whose cycle lengths differ markedly from most participants (e.g., more than 4 standard deviations from the mean, or some other threshold) to see whether the outlying values influence the results, or reclassifying participants at each reporting timepoint as having amenorrhea if they report a cycle length of 84 days or more (cessation of menstruation for three months).
A further issue is that there are small peaks in the distribution at around 5 days at some timepoints (Figure 3), suggesting that some individuals reported the number of days of bleeding rather than their cycle length. This needs to be managed when deriving variables for analyses. One possible method is to set responses below a certain threshold (e.g., 10 days) to missing and replace them using multiple imputation, with responses to the same question at other timepoints contributing to the imputation model. As with outliers, multiple methods could be adopted to handle these possible misinterpretations, and a sensitivity analysis comparing results across approaches is advised. Moreover, the wording of the questions varied between timepoints for the G1 sample, which may have affected how participants responded. For example, some questions specified that the length of a usual menstrual cycle referred to the interval from the first day of one period to the first day of the next, whereas others did not provide this clarification. It is therefore possible that, at some timepoints, some participants reported the number of days between the end of one period and the start of the next, which could affect the accuracy of reporting and how appropriate it is to compare responses between timepoints. Researchers should consider how to manage this limitation when designing their analyses.
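The cleaning options above can be kept separate from the analysis by flagging values first and deciding how to handle them later. The sketch below is a minimal illustration, assuming one list of reported cycle lengths (in days) per timepoint with `None` for missing responses; the function name and dictionary keys are hypothetical, and the thresholds (4 SD, 10 days, 84 days) are the examples given in the text rather than fixed rules.

```python
import statistics
from typing import Optional

AMENORRHEA_DAYS = 84   # three months without menstruation (example from the text)
MISREPORT_BELOW = 10   # illustrative threshold for likely "days bleeding" answers
SD_CUTOFF = 4.0        # illustrative outlier threshold (4 SD from the mean)


def flag_cycle_lengths(lengths: list[Optional[float]]) -> list[dict]:
    """Flag each reported cycle length (in days) from a single timepoint.

    None marks a missing response. Flags are returned rather than applied,
    so that different handling rules can be compared in sensitivity analyses.
    """
    observed = [x for x in lengths if x is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    flags = []
    for x in lengths:
        if x is None:
            flags.append({"value": None, "outlier": None,
                          "possible_bleed_days": None, "amenorrhea": None})
            continue
        flags.append({
            "value": x,
            # candidate for exclusion in a sensitivity analysis
            "outlier": abs(x - mean) > SD_CUTOFF * sd,
            # candidate to set to missing and multiply impute
            "possible_bleed_days": x < MISREPORT_BELOW,
            # candidate to reclassify as amenorrhea at this timepoint
            "amenorrhea": x >= AMENORRHEA_DAYS,
        })
    return flags
```

For example, `flag_cycle_lengths([28.0] * 30 + [5.0, 150.0, None])` flags the 150-day value as both an outlier and possible amenorrhea, and the 5-day value as a likely days-of-bleeding answer. Note that the mean and SD here include the extreme values themselves, which makes the SD rule conservative; a median-based rule would be a reasonable alternative design choice.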
Researchers may want to derive categorical variables to describe infrequent, frequent, and normal cycles (Table 2 and Table 9). Normal cycles are considered to range from 24 to 38 days, whereas cycles shorter than 24 days are considered frequent and those longer than 38 days are considered infrequent [10]. Such variables could be derived from the continuous variables (noting the issues above). Unfortunately, although ALSPAC also collected categorical data at some timepoints, the original response categories do not always align with the clinical definitions of infrequent, frequent, and normal cycles, and the category boundaries vary between data collection timepoints. At some G0 and G1 timepoints, the categories are 23 days or less, 24-35 days, and more than 35 days, whereas at others, the categories are less than 21 days, 21-25 days, 26-31 days, 32-39 days, 40-50 days, and more than 50 days. Finally, at the age 24 timepoint for G1, the categories are less than 25 days, 25-34 days, 35-60 days, and more than 60 days. Therefore, it may not be feasible to include all the available timepoints in a given analysis, depending on its nature and the timepoints of interest.
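One way to see which categorical responses can be mapped onto the clinical definitions is to check whether both ends of each response band fall into the same clinical category. The sketch below assumes whole-day responses, represents open-ended bands with `math.inf`, and uses hypothetical label strings in place of the numeric ALSPAC codes; a band that straddles a clinical boundary returns `None`.

```python
import math
from typing import Optional


def classify_days(days: float) -> str:
    """Clinical categories from the text: <24 frequent, 24-38 normal, >38 infrequent."""
    if days < 24:
        return "frequent"
    if days <= 38:
        return "normal"
    return "infrequent"


def classify_band(min_day: float, max_day: float) -> Optional[str]:
    """Map a response band (inclusive whole days) to a clinical category,
    or None if the band straddles a clinical boundary."""
    low, high = classify_days(min_day), classify_days(max_day)
    return low if low == high else None


# Response bands described in the text, as (min, max) whole days.
BANDS = {
    "23 days or less": (0, 23),
    "24-35 days": (24, 35),
    "more than 35 days": (36, math.inf),
    "less than 21 days": (0, 20),
    "21-25 days": (21, 25),
    "26-31 days": (26, 31),
    "32-39 days": (32, 39),
    "40-50 days": (40, 50),
    "more than 50 days": (51, math.inf),
    "less than 25 days": (0, 24),
    "25-34 days": (25, 34),
    "35-60 days": (35, 60),
    "more than 60 days": (61, math.inf),
}
```

Running `{label: classify_band(*band) for label, band in BANDS.items()}` shows, for instance, that "more than 35 days" and "35-60 days" cannot be assigned unambiguously, which is one concrete reason some timepoints may have to be dropped or handled separately.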

Amenorrhea
Binary variables for amenorrhea can be derived for both G0 and G1 at each timepoint (no period in at least the last 3 months vs. had a period in the last 3 months) (Table 3 and Table 10). However, most respondents who reported not having a period in the last 3 months provided a reason that could account for this (e.g., surgery, chemotherapy, pregnancy, menopause, contraception, or not having started their periods yet), and the same may be true at timepoints where participants were not asked to provide a reason. Information on contraception use and other possible explanations is available from other data collection waves, but not necessarily from the same wave in which information on amenorrhea was collected (see further considerations below). This could be used to infer whether a participant who reported not having a period in the last 3 months was also using contraception that could explain this.
**Heavy bleeding is defined as reporting 'heavy or prolonged bleeding', more than 8 days bleeding, or 'very'/'moderately' heavy periods, whereas not heavy is defined as reporting 'no heavy or prolonged bleeding', bleeding for 8 days or less, or periods that are 'mildly'/'not at all' heavy.
The cycle length variables in which participants reported the exact number of days of their cycles could also be used. To be consistent with the above variables, they could be used to derive binary variables reflecting cycles of 84 days or more (i.e., no period in the last 3 months) compared with cycles of less than 84 days. This would be consistent with clinical guidelines, which characterise amenorrhea, specifically secondary amenorrhea, as the cessation of menstruation for three months [13,22]. However, as discussed above, the high values reported by some participants in response to questions about their cycle length could be the result of data entry errors, and researchers should therefore be cautious about including these variables.
*Painful periods are defined as reporting 'severe cramps', 'severe/moderate pain with your period', or 'very/moderately' painful periods, whereas not painful is defined as reporting 'no severe cramps', 'mild/no pain with your period', or periods that are 'mildly/not at all' painful.
*Participants are classified as experiencing the PMS-related symptom (very fatigued, irritable, depressed, anxious, or other) if they reported the symptom either before or during their period, whereas they are classified as not experiencing the symptom if they reported it neither before nor during their period.
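The two sources of amenorrhea information described above can be combined in a single derivation. This is a minimal sketch under stated assumptions: the direct "period in the last 3 months" question takes priority, a reported cycle length of 84 days or more is used only as a fallback, and absences with a stated reason are kept as a separate category rather than silently recoded. The reason labels and function name are hypothetical; the actual response options differ by wave.

```python
from typing import Optional

# Hypothetical reason codes; the real ALSPAC response options differ by wave.
EXPLAINED_REASONS = {
    "surgery", "chemotherapy", "pregnancy", "breastfeeding",
    "menopause", "contraception", "not started",
}


def amenorrhea_status(period_last_3_months: Optional[bool],
                      reason: Optional[str] = None,
                      cycle_length_days: Optional[float] = None) -> Optional[str]:
    """Return 'menstruating', 'amenorrhea', 'explained absence' or None (unknown)."""
    if period_last_3_months is None:
        # Fall back on cycle length: 84+ days means three months without a period.
        if cycle_length_days is None:
            return None
        no_period = cycle_length_days >= 84
    else:
        no_period = not period_last_3_months
    if not no_period:
        return "menstruating"
    if reason in EXPLAINED_REASONS:
        return "explained absence"
    return "amenorrhea"
```

Keeping "explained absence" as its own level leaves the decision to exclude, recode, or model these participants to the analysis stage, where information on contraception from adjacent waves can be brought in.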

Cycle regularity
For both G0 and G1, the cycle regularity data could be used to derive a binary variable at each timepoint (regular vs. irregular), whereby a participant is classified as having irregular periods if they reported that their periods were not regular, too irregular to estimate their cycle length, or very/moderately irregular (Table 4 and Table 11). Alternatively, the more granular 4-level G0 variables (very, moderately, mildly, or not at all irregular) could be retained, as they are available at half of the timepoints (mean age 32.0, 33.1, 34.0, 35.1, 38.3, 41.4, 48.6, and 51.3 years), if these are the only timepoints being considered in the analysis.
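The binary regularity definition above amounts to a lookup table. A minimal sketch, assuming the responses have already been translated into text labels (the labels here are hypothetical stand-ins for the wave-specific numeric codes):

```python
from typing import Optional

# Hypothetical text labels standing in for the wave-specific numeric ALSPAC codes.
IRREGULAR = {"not regular", "too irregular to estimate",
             "very irregular", "moderately irregular"}
REGULAR = {"regular", "mildly irregular", "not at all irregular"}


def irregular_binary(response: Optional[str]) -> Optional[int]:
    """1 = irregular, 0 = regular, None = missing, "don't know" or unrecognised."""
    if response in IRREGULAR:
        return 1
    if response in REGULAR:
        return 0
    return None
```

At timepoints where regularity was asked alongside cycle length options, choosing a length option rather than "not regular" would first need to be recoded to "regular" under this scheme. Returning `None` for anything unrecognised, rather than defaulting to 0, avoids quietly counting unexpected codes as regular cycles.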

Heavy or prolonged bleeding
For the G0 sample, the granular 4-level heavy bleeding variables (very, moderately, mildly, or not at all heavy) are available at each timepoint and could be used as they are, or binary variables could be derived (very or moderately heavy vs. mildly or not at all heavy). Participants also reported the duration of bleeding at each of these timepoints. Binary variables for prolonged bleeding, which is defined as more than 8 days of bleeding [10], could therefore also be derived (more than 8 days vs. 8 days or less of bleeding) (Table 5).
For G1, however, it is not possible to separate heavy and prolonged bleeding in line with clinical guidelines because of the questions that were asked [10]. Instead, binary variables for heavy or prolonged bleeding could be derived (heavy or prolonged bleeding vs. neither heavy nor prolonged bleeding), whereby those who reported heavy or prolonged bleeding, very or moderately heavy bleeding, or bleeding for more than 8 days are classified as having heavy or prolonged bleeding (Table 12). Unfortunately, the G1 categorical response options for days of bleeding (3 days or less, 4-6 days, or 7 days or more) do not align with the clinical threshold for prolonged bleeding and so are more difficult to incorporate into this binary definition.
It would also be possible to derive the same binary variables for the G0 sample if it was more appropriate for the specific research question to have comparable variables across the two generations (Table 5).
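The separate G0 indicators and the combined heavy-or-prolonged indicator that works for both generations can be sketched as follows. This assumes hypothetical text labels for the 4-level heaviness responses and exact days of bleeding only; the G1 categorical day bands are deliberately not accepted because, as noted above, they do not map onto the more-than-8-days threshold.

```python
from typing import Optional

# Hypothetical labels for the 4-level heaviness responses.
HEAVY_LEVELS = {"very": 1, "moderately": 1, "mildly": 0, "not at all": 0}


def heavy(heaviness: Optional[str]) -> Optional[int]:
    """G0 heavy bleeding: very/moderately heavy vs. mildly/not at all heavy."""
    return HEAVY_LEVELS.get(heaviness)


def prolonged(days_bleeding: Optional[float]) -> Optional[int]:
    """Prolonged bleeding: more than 8 days (exact days only)."""
    if days_bleeding is None:
        return None
    return 1 if days_bleeding > 8 else 0


def heavy_or_prolonged(heaviness: Optional[str] = None,
                       reported_heavy_or_prolonged: Optional[bool] = None,
                       days_bleeding: Optional[float] = None) -> Optional[int]:
    """Combined indicator usable for G0 and G1: 1 if any available source
    indicates heavy or prolonged bleeding, 0 if at least one source is
    observed and none does, None if nothing is observed."""
    signals = [
        heavy(heaviness),
        None if reported_heavy_or_prolonged is None else int(reported_heavy_or_prolonged),
        prolonged(days_bleeding),
    ]
    observed = [s for s in signals if s is not None]
    if not observed:
        return None
    return 1 if any(observed) else 0
```

Treating a participant as "not heavy or prolonged" whenever at least one source is observed is a design choice; a stricter version could require all available sources to be observed before assigning 0.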
A key consideration with regard to the number of days of bleeding variables is that there are some outliers. For example, at the G0 mean age 41.4 year questionnaire and the age 14.6 and 17 year G1 puberty questionnaires, participants reported up to 60 days of bleeding. These outliers could be due to data entry errors or participants misunderstanding the question as relating to the interval between periods, or they could be genuine responses reflecting very long episodes of bleeding. It is not possible with the available data to distinguish between these explanations, and there is therefore no clear way to handle such outliers. Approaches could include recoding values beyond a chosen threshold (e.g., more than 4 SDs from the mean, or above the 99th percentile) as missing, imputing them based on data available at other timepoints, or replacing them with the highest non-outlier value ('top-coding'). The most appropriate method will depend on the nature of the analyses being conducted; however, it may be beneficial to conduct a sensitivity analysis comparing the results across different approaches to handling the outliers.
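To support the suggested sensitivity analysis, the same data can be passed through each handling rule in turn. A minimal sketch, assuming a single list of days-of-bleeding responses from one timepoint and using the 99th percentile of the observed values as the (illustrative) outlier threshold; the function name and method labels are hypothetical.

```python
import statistics
from typing import Optional


def handle_outliers(values: list[Optional[float]], method: str) -> list[Optional[float]]:
    """Apply one outlier rule to days-of-bleeding responses from a single timepoint.

    Values above the 99th percentile of the observed data are treated as outliers.
    method: "keep" (leave as reported), "missing" (set to None, e.g. for later
    multiple imputation) or "topcode" (replace with the highest non-outlier value).
    """
    if method not in {"keep", "missing", "topcode"}:
        raise ValueError(f"unknown method: {method!r}")
    observed = [v for v in values if v is not None]
    cutoff = statistics.quantiles(observed, n=100)[98]  # 99th percentile
    highest_ok = max(v for v in observed if v <= cutoff)
    out: list[Optional[float]] = []
    for v in values:
        if v is None or v <= cutoff or method == "keep":
            out.append(v)
        elif method == "missing":
            out.append(None)
        else:  # "topcode"
            out.append(highest_ok)
    return out
```

Fitting the same model to the output of each of the three methods gives a direct comparison of how much the outliers drive the results. A percentile rule will always flag roughly the top 1% of responses, even when they are plausible, so an SD-based threshold may be preferable when few extreme values are expected.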

Menstrual pain
It is possible to derive binary pain variables for both G0 and G1 (Table 6 and Table 13). Most of the G1 pain-related variables are already binary, except for those at ages 15.5 and 21. These could be dichotomised so that those reporting severe or moderate pain (age 15.5) or very or moderately painful periods (age 21) are categorised as having pain associated with their periods. The G0 variables could be dichotomised in the same way, with those reporting very or moderately painful periods categorised as having pain associated with their periods, or retained as 4-level variables to maintain granularity, if appropriate for the analysis. However, some caution is needed with the G1 variables, as differences in question wording may mean that participants responded differently to the binary puberty questions. Most of the puberty questions ask about 'severe cramps', whereas the 15.5 year puberty question asks about 'pain with your period' and, subsequently, whether this pain is severe, moderate, or mild. These are slightly different concepts, and this appears to be reflected in the responses: at the preceding and following timepoints, 48.9% and 56.3% of respondents, respectively, reported severe cramps, whereas at 15.5 years only 16.4% of all respondents reported severe pain and 61.7% reported severe or moderate pain.
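The pain dichotomisation above, including the wording problem at age 15.5, can be made explicit with a single mapping plus an option for a stricter definition. A minimal sketch, assuming hypothetical `(question type, response)` labels in place of the ALSPAC codes; the `severe_only` option is an illustrative sensitivity analysis, not a definition from the text.

```python
from typing import Optional

# Hypothetical (question type, response) labels; ALSPAC stores wave-specific numeric codes.
PAIN_MAP = {
    # most G1 puberty questionnaires: severe cramps, yes/no
    ("severe_cramps", "yes"): 1,
    ("severe_cramps", "no"): 0,
    # G1 age 15.5: pain with period and, if yes, its severity ("none" = no pain)
    ("pain_severity", "severe"): 1,
    ("pain_severity", "moderate"): 1,
    ("pain_severity", "mild"): 0,
    ("pain_severity", "none"): 0,
    # G0 questionnaires and G1 age 21: how painful are your periods
    ("how_painful", "very"): 1,
    ("how_painful", "moderately"): 1,
    ("how_painful", "mildly"): 0,
    ("how_painful", "not at all"): 0,
}


def painful_periods(question: str, response: Optional[str],
                    severe_only: bool = False) -> Optional[int]:
    """1 = painful periods, 0 = not painful, None = missing or unrecognised.

    severe_only=True counts only 'severe' pain at age 15.5, which is closer in
    wording to the 'severe cramps' questions.
    """
    if severe_only and question == "pain_severity" and response == "moderate":
        return 0
    return PAIN_MAP.get((question, response))
```

Running the analysis with `severe_only` set to both values brackets the effect of the 15.5 year wording difference, given the gap between 16.4% (severe) and 61.7% (severe or moderate) reported above.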

Premenstrual symptoms
There are multiple ways in which the PMS-related variables could be used, depending on the nature of the analysis. Firstly, a binary variable could be derived which reflects experiencing 'particular problems', 'problems with their period', 'pre-menstrual tension', or any of the listed problems related to periods (irritable, anxious, depressed, very fatigued, other) vs. no problems. Secondly, binary variables could be derived for each of these symptoms separately (e.g., irritable vs. not irritable). These variables could be derived separately for symptoms before and during the period, or combined across both. This would be useful for examining the timing of specific symptoms and for distinguishing symptoms occurring during a period from those occurring before it, in line with clinical definitions of PMS [23,24]. Finally, it would also be possible to derive a variable reflecting the number of PMS-related symptoms an individual experiences, either before their periods only (in line with clinical definitions of PMS) or during a period (Table 7 and Table 14). Whilst PMS is not defined by a specific number of symptoms, this variable, which would range from 0 (no listed symptoms) to 5 (all listed symptoms), could be useful for examining the severity of PMS, as more symptoms may reflect a greater negative impact on daily functioning [23,24].
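The three PMS derivations above can share one representation of the responses. This sketch assumes each participant's answers are stored as a dictionary mapping symptom names to the set of timings they ticked ("before", "during"); that structure and the function names are hypothetical, and participants who did not answer the questionnaire would need to be handled as missing before this step rather than passed in as an empty dictionary.

```python
from typing import AbstractSet

SYMPTOMS = ("very fatigued", "irritable", "depressed", "anxious", "other")
BEFORE = frozenset({"before"})
BEFORE_OR_DURING = frozenset({"before", "during"})


def symptom_flags(responses: dict[str, set[str]],
                  timing: AbstractSet[str] = BEFORE_OR_DURING) -> dict[str, int]:
    """Per-symptom binary: 1 if the symptom was reported at any requested timing."""
    return {s: int(bool(responses.get(s, set()) & timing)) for s in SYMPTOMS}


def symptom_count(responses: dict[str, set[str]],
                  timing: AbstractSet[str] = BEFORE) -> int:
    """Number of listed symptoms (0-5); premenstrual only by default."""
    return sum(symptom_flags(responses, timing).values())


def any_problem(reported_problems: bool, responses: dict[str, set[str]]) -> int:
    """1 if 'particular problems' were reported or any listed symptom was ticked."""
    return int(reported_problems or symptom_count(responses, BEFORE_OR_DURING) > 0)
```

For a participant who ticked irritability before, fatigue both before and during, and anxiety only during their period, `symptom_count` gives 2 with the premenstrual-only default and 3 when timings are combined, which illustrates how much the timing choice can change a severity score.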
A key consideration for the PMS-related data is that PMS encompasses a wide range of physical and psychological symptoms. Other common PMS-related symptoms, such as headaches, acne, and breast tenderness [23,24], are not captured in the ALSPAC data. Whilst participants could report other problems with their period, we cannot be sure how participants interpreted this option, nor that they reported all other relevant symptoms.
A further issue with the G0 data is that some of the earlier timepoints only ask about 'particular problems' and do not follow up with questions about which symptoms were experienced and when. Participants could label a broad range of problems as 'particular problems', not all of which would necessarily be considered PMS, so there may be some misclassification. Misclassification may also arise from the retrospective nature of these assessments, which has been suggested to result in over-reporting of symptoms compared with prospective diary reporting [25]. Finally, the G1 data are somewhat limited because they are only available at one timepoint.

Further considerations
Contraception is a primary factor that must be considered when using the menstrual cycle feature data outlined above. Hormonal contraception is particularly prevalent and, although participants using hormonal contraception will not experience natural menstrual cycles, some will experience withdrawal bleeds that they may classify as menstrual periods (and respond to questions about their menstrual cycle features accordingly). In addition, hormonal contraception is often prescribed in response to menstrual cycle issues such as heavy menstrual bleeding, irregular cycles, or pain. Therefore, because menstrual cycle features and contraception are bidirectionally associated, it may be inappropriate to adjust for, or exclude, those using hormonal contraception, depending on the analyses being conducted. For example, research examining whether BMI influences heavy menstrual bleeding would not want to exclude individuals using hormonal contraception, as BMI can affect the likelihood of using hormonal contraception.
Because hormonal contraception use may be influenced by both BMI and heavy menstrual bleeding, it acts as a collider; excluding contraception users conditions on this collider and could therefore introduce collider bias, potentially producing a spurious association between BMI and heavy menstrual bleeding. Other methods may account for hormonal contraception in analyses such as this, possibly including multiple imputation, meta-regression, and probability weighting, and data on contraception are available in ALSPAC to enable this. Table 15 and Table 16 provide a summary of the available contraceptive variables for G0 and G1, respectively.
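The collider bias argument can be illustrated with a small simulation. This is a hypothetical sketch, not ALSPAC data: BMI is generated with no effect at all on heavy menstrual bleeding, while both higher BMI and HMB are assumed to increase the probability of hormonal contraception use (the direction and sizes of these effects are assumptions chosen only to show the mechanism). Restricting the sample to non-users then creates an association that does not exist in the full sample.

```python
import math
import random
import statistics


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n: int = 20_000, seed: int = 1) -> list[tuple[float, int, int]]:
    """Hypothetical cohort: BMI has NO effect on heavy menstrual bleeding (HMB),
    but both BMI and HMB increase the probability of hormonal contraception use."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        bmi_z = rng.gauss(0.0, 1.0)            # standardised BMI
        hmb = 1 if rng.random() < 0.3 else 0   # independent of BMI by construction
        # assumed log-odds effects of BMI and HMB on contraception use
        contra = 1 if rng.random() < expit(-1.0 + 1.0 * bmi_z + 2.0 * hmb) else 0
        rows.append((bmi_z, hmb, contra))
    return rows


rows = simulate()
full = pearson([r[0] for r in rows], [r[1] for r in rows])
non_users = [r for r in rows if r[2] == 0]      # "excluding" contraception users
restricted = pearson([r[0] for r in non_users], [r[1] for r in non_users])
print(f"all participants:  r(BMI, HMB) = {full:+.3f}")
print(f"non-users only:    r(BMI, HMB) = {restricted:+.3f}")
```

Under these assumptions the correlation is close to zero in the full sample but clearly negative among non-users, because non-users with HMB tend to be those whose low BMI kept them off contraception. The real direction and size of any induced bias in ALSPAC would depend on the true relationships, which this toy model does not estimate.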
ALSPAC has collected data on multiple other reproductive factors that can influence menstrual cycle features, and researchers should consider these when planning analyses. Pregnancy and breastfeeding result in an absence of menstruation and can lead to more problematic periods when menstruation first resumes. Also, menopause and surgeries such as hysterectomy and oophorectomy can stop menstruation, and researchers will therefore need to consider how to account for this. Age at menarche is another important factor assessed in ALSPAC, which some researchers may want to consider depending on their research question. Whilst ALSPAC participants have been asked about recent pregnancies, breastfeeding, menopause, surgeries, and hormonal contraceptive use, these questions were not necessarily asked at the same timepoints as those on menstrual cycle features, and therefore including such variables may require making inferences backwards or forwards in time.
Moreover, many of the menstrual cycle questions ask participants about their most recent period. Most participants are likely to be answering about a period they had within the past month. Others may be answering about periods much further back in time, before starting hormonal contraception, becoming pregnant, having a hysterectomy or oophorectomy, or going through menopause. This further highlights the importance of considering other reproductive factors where possible, to ensure that only participants who are experiencing a menstrual cycle at the time of reporting are included in analyses. It also matters because recall bias may become an issue for participants answering questions about menstrual cycles experienced many months or years ago.
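The inclusion rule described above can be sketched as a simple per-timepoint filter. This is a minimal Python sketch with several assumptions. The record type `PeriodReport`, its field names, and the three-month `max_months` cut-off are all hypothetical and are not ALSPAC variable names or recommended thresholds. The sketch also assumes that the reproductive-factor flags have already been aligned to the timepoint of the menstrual question, which, as noted above, may require inference across timepoints. Because excluding hormonal contraception users can induce collider bias in some analyses, exclusion on that basis is a switch rather than a fixed rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PeriodReport:
    # Hypothetical record for one participant at one timepoint; the field
    # names are illustrative and are NOT actual ALSPAC variable names.
    participant_id: str
    months_since_last_period: Optional[float]  # None if not reported
    hormonal_contraception: bool
    pregnant_or_breastfeeding: bool
    hysterectomy_or_oophorectomy: bool
    postmenopausal: bool


def currently_cycling(report: PeriodReport,
                      max_months: float = 3.0,
                      exclude_hormonal_contraception: bool = True) -> bool:
    """Return True if a 'most recent period' answer plausibly refers to a
    current natural cycle. max_months is an assumed cut-off."""
    if exclude_hormonal_contraception and report.hormonal_contraception:
        return False
    if report.pregnant_or_breastfeeding:
        return False
    if report.hysterectomy_or_oophorectomy or report.postmenopausal:
        return False
    if report.months_since_last_period is None:
        return False  # cannot confirm that the period is recent
    return report.months_since_last_period <= max_months


def select_current_cycles(reports: List[PeriodReport], **kwargs) -> List[PeriodReport]:
    """Keep only the reports that pass currently_cycling()."""
    return [r for r in reports if currently_cycling(r, **kwargs)]


reports = [
    PeriodReport("A", 0.5, False, False, False, False),   # recent, no exclusions
    PeriodReport("B", 14.0, False, False, False, False),  # last period long ago
    PeriodReport("C", 1.0, True, False, False, False),    # hormonal contraception
    PeriodReport("D", None, False, True, False, False),   # pregnant/breastfeeding
]
kept = [r.participant_id for r in select_current_cycles(reports)]
```

Here `kept` is `["A"]`. With `exclude_hormonal_contraception=False`, participant "C" is also retained.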
Limited data are available on health conditions that might cause the problematic menstrual cycle features summarised in this data note. For example, participants were asked whether they had ever been diagnosed with PCOS at age 22 (G1) and at mean age 49.7 years (G0). The same question was asked about endometriosis at age 22, for G1 only. These data provide an opportunity to identify individuals whose problematic menstrual cycle features at previous timepoints resulted from one of these underlying disorders. However, these conditions tend to be underdiagnosed and are often diagnosed only after a long delay, so the data are unlikely to capture all participants whose menstrual cycle features are due to PCOS or endometriosis. ALSPAC also holds detailed data on many other factors that researchers may wish to consider alongside menstrual cycle features, depending on the research question.
Missing data need to be considered when using menstrual symptom data in ALSPAC. Missing data could lead to selection bias if missingness is related to the exposure and outcome being explored 26. Such missingness may arise, for example, from loss to follow-up or from participants not responding to some questions. Missing menstrual symptom data could arise for a variety of reasons, including withdrawal from ALSPAC, loss to follow-up, not wanting to answer some questions, pregnancy, breastfeeding, menopause, contraception, and surgeries. Several of these (e.g., withdrawal, loss to follow-up, pregnancy, breastfeeding) are plausibly socially patterned. They are therefore likely to be related to the exposures and outcomes being studied in relation to menstrual cycle features, which makes selection bias possible. How this is explored and dealt with will depend on the specific research question being addressed and on the pattern and extent of missing data. Several papers can help with exploring this, including some that have been used previously in ALSPAC studies 27,28.
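One way to see how socially patterned missingness biases a simple prevalence estimate, and how probability weighting can correct it, is a small simulation. This is a minimal Python sketch with hypothetical assumptions. A binary "deprived" indicator raises both the prevalence of heavy menstrual bleeding and the probability of non-response, and all probabilities are made up. Crucially, the sketch assumes that missingness depends only on this fully observed variable (missing at random). If missingness also depended on the unobserved outcome itself, weighting of this kind would not remove the bias.

```python
import random

# Hypothetical inputs: 40% deprived; HMB prevalence 30% vs 15%;
# response probability 40% vs 80%.
TRUE_PREVALENCE = 0.4 * 0.30 + 0.6 * 0.15  # = 0.21 in the simulated population


def simulate_cohort(n: int = 100_000, seed: int = 7):
    """Return (deprived, hmb, responded) tuples."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        deprived = rng.random() < 0.4
        hmb = rng.random() < (0.30 if deprived else 0.15)
        responded = rng.random() < (0.4 if deprived else 0.8)
        people.append((deprived, hmb, responded))
    return people


def complete_case_prevalence(people):
    """Prevalence among responders only (ignores why data are missing)."""
    responders = [p for p in people if p[2]]
    return sum(p[1] for p in responders) / len(responders)


def ipw_prevalence(people):
    """Inverse-probability-weighted prevalence among responders.

    The response probability is estimated within each level of the observed
    covariate, and each responder is weighted by 1 / P(respond | stratum).
    """
    weights = {}
    for stratum in (True, False):
        group = [p for p in people if p[0] == stratum]
        weights[stratum] = len(group) / sum(p[2] for p in group)
    numerator = sum(weights[p[0]] * p[1] for p in people if p[2])
    denominator = sum(weights[p[0]] for p in people if p[2])
    return numerator / denominator


cohort = simulate_cohort()
cc = complete_case_prevalence(cohort)
ipw = ipw_prevalence(cohort)
print(f"complete case: {cc:.3f}, weighted: {ipw:.3f}, true: {TRUE_PREVALENCE:.3f}")
```

Under these assumptions, the complete-case estimate should fall close to 0.19, underestimating the true 0.21 because deprived participants respond less often. The weighted estimate should be close to 0.21.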
One of the primary benefits of the ALSPAC dataset is the repeated measurement of menstrual cycle features at multiple timepoints. Multiple imputation can use data from other similar timepoints to retain participants who have missing data at one timepoint, which may allow researchers to increase their sample size. Beyond this, repeated measures can enable trajectory modelling to explore the causes and consequences of different patterns of menstrual cycle features over time, if appropriate for the research question. The longitudinal nature of the data also enables it to be used to assess causal relationships between menstrual cycle features and their possible causes and consequences, reducing the likelihood of reverse causality.
However, researchers should also consider the possibility of misclassification. Random measurement error is always a possibility, but researchers should also consider possible sources of systematic measurement error. For example, particular groups may feel more uncomfortable answering questions about their menstrual cycle and therefore give answers that do not reflect their menstrual cycle features. Certain groups may also be more or less likely to notice or report their menstrual cycle features. The subjective nature of many of the questions may also contribute to misclassification, as participants may have different views of what constitutes symptoms such as "heavy bleeding" or "severe cramps". However, the subjective experience of these features is crucial, and it is considered in the clinical guidelines for the diagnosis of HMB 10. Moreover, menstrual cycle features may vary randomly over time, contributing to random error and the possibility of regression dilution when using repeated measures 29. Researchers therefore need to be aware of both random and systematic measurement error and address them as far as possible within their analyses.
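Regression dilution can be illustrated numerically. This is a minimal Python sketch using a continuous, hypothetical stand-in for a menstrual cycle feature (for example, a standardised cycle length) measured as an exposure. The true trait has variance 1, and each timepoint adds independent random error with variance 1. The outcome depends on the true trait with slope 0.5. None of these values come from ALSPAC. With one measurement, the expected slope is attenuated by the reliability ratio 1 / (1 + 1) = 0.5, giving about 0.25. Averaging four measurements reduces the error variance to 0.25, so the reliability becomes 1 / 1.25 = 0.8 and the expected slope about 0.4.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(42)
N, K, TRUE_SLOPE = 50_000, 4, 0.5  # hypothetical sample size, timepoints, effect

true_x = [rng.gauss(0, 1) for _ in range(N)]
y = [TRUE_SLOPE * t + rng.gauss(0, 1) for t in true_x]
# Each timepoint measures the trait with independent random error (variance 1).
measures = [[t + rng.gauss(0, 1) for _ in range(K)] for t in true_x]

slope_single = ols_slope([m[0] for m in measures], y)    # one timepoint
slope_mean = ols_slope([sum(m) / K for m in measures], y)  # mean of K timepoints
print(f"single: {slope_single:.3f}, mean of {K}: {slope_mean:.3f}, true: {TRUE_SLOPE}")
```

Averaging is only one option. Where the reliability can be estimated from the repeated measures, some analyses instead correct the attenuated estimate directly.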
The data note helpfully outlines what data has been collected over what period of time and by what methods.It is particularly helpful to researchers in outlining some of the difficulties and pitfalls when using this data.These aspects of the data appear to have been carefully thought through in this report.The authors also provide information on how to apply to use this data for research.
I have some minor comments for improving the paper. In the abstract, it would be helpful to understand the ages of the mothers and daughters. Currently it says that the data are collected at 21 and 20 timepoints but does not describe the ages of the participants. Secondly, in the description of the G1 cohort, for example in Figure 2, it is not clear if the ages described are the exact ages of the participants or the average ages.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Women's health, Epidemiology
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
I have some minor comments for improving the paper. In the abstract, it would be helpful to understand the ages of the mothers and daughters. Currently it says that the data are collected at 21 and 20 timepoints but does not describe the ages of the participants.
We have added information on the ages of the mothers and daughters across the timepoints in the abstract.
Secondly, in the description of the G1 cohort, for example in Figure 2, it is not clear if the ages described are the exact age of the participants or are the average ages.
We have changed the X axis label in both Figure 1 and Figure 2 to clarify that these are the average (mean) ages of the G0 or G1 sample at each data collection timepoint.

Introduction
Very well written. However, please note the following edits:
○ Citation missing for the first line.
○ Citation missing for line ending with "idiopathic".

○ The introduction could be enriched by looking at the differences in research and reporting of problematic menstrual features between LMICs and HICs. I think this should be included to provide context. Also, there is something to be said about perceptions; perhaps there is underreporting because there is a lack of understanding or education around what is considered "problematic".

Methods
○ The full name, Research Electronic Data Capture (REDCap), should appear before the abbreviation is used. Also, is there a citation available for this?

○ For Figure 2, how was "started period" defined? Would one day of spotting suffice? Or three sequential months of bleeding? Girls often have irregular bleeding at menarche, so it would be better to understand what "started period" actually meant to those reporting.

○ The question around heavy bleeding is interesting as this is very subjective. Curious as to how this was interpreted by participants… Were they considering "heavy" in relation to their personal experiences or in comparison to their perception of what is "normal"?
○ Similar comments for pain: how is a participant interpreting this question? It may have been better to look at the effect of pain on the ability to conduct usual daily activities.
○ I commend the use of repeated measures.

○ Further considerations should also explore the effect of medications/vaccinations on the menstrual cycle. There is a growing body of literature that suggests these could influence the menstrual cycle.
Reviewer Expertise: menstrual health
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
The question around heavy bleeding is interesting as this is very subjective. Curious as to how this was interpreted by participants… Were they considering "heavy" in relation to their personal experiences or in comparison to their perception of what is "normal"?
We have added a few sentences to the final paragraph of the Further Considerations. They explain the subjective nature of such questions and note that, although this may pose some limitations, it is also crucial to measure and understand people's experiences of such symptoms. Subjective reporting of heavy bleeding is also used in the diagnosis of heavy menstrual bleeding.
Similar comments for pain: how is a participant interpreting this question? It may have been better to look at the effect of pain on the ability to conduct usual daily activities.
See our response above regarding heavy bleeding; we have added a few sentences to address this. Measuring the effect of pain on daily activities is a useful way of understanding menstrual pain, but such measures are not available in the ALSPAC cohort.
I commend the use of repeated measures. Further considerations should also explore the effect of medications/vaccinations on the menstrual cycle. There is a growing body of literature that suggests these could influence the menstrual cycle.
We have added a sentence to the fourth paragraph of the Further Considerations. It highlights that ALSPAC holds data on many factors not mentioned here, which researchers may want to consider depending on their specific research question. We have not specifically mentioned medications/vaccinations, as we do not aim to provide an exhaustive list of such factors in this section.

Figure 1. Number of G0 participants (mothers) with recent periods. 'No recent period' includes participants who reported no period in the last 12 months, no 'recent' periods, or no 'periods nowadays'.

Figure 2. Number of G1 participants (daughters) who have started their periods. 'Started period' includes participants who responded yes when asked at each timepoint whether they had started their periods yet.

Table 4. Original and derived G0 cycle regularity variables (N (%))
…before your last period, how many days do you usually have between the start of one period and the start of the next period?
…are regular, how long on average would you say your cycle is? (i.e. number of days between each period, e.g. 30, 28)
…daughter) has started her regular periods, have there been any months when the period didn't happen at all?
What is the length of your usual menstrual cycle (the interval from first day of period to first day of next period), when you are NOT using oral contraception, …
…-based questionnaire completed by mothers. Cell counts below 5 have been replaced with <5, and this may include 0. *Continuous responses recoded.

Figure 3. Histograms showing G0 and G1 cycle length variables. 3A is from the G0 37.8-year questionnaire and 3B is from the G0 40.3-year questionnaire. 3C is from the G1 13.1-year puberty questionnaire and 3D is from the G1 age 24 clinic assessment.
was born / toddler was 8 months old / child was 18 months old / in the past year / child's 5th birthday / in the last 2 years, how often have you used the following:

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

Have you ever had a dilatation and curettage (D&C / scrape)? If yes, was this because of: heavy periods?
*Heavy bleeding is defined as reporting 'very' or 'moderately' heavy periods, whereas not heavy is defined as reporting periods that are 'mildly' or 'not at all' heavy.

Please indicate if you have used any medicines in the last 12 months: painful periods
*Painful periods are defined as reporting 'very' or 'moderately' painful periods, whereas not painful is defined as reporting periods that are 'mildly' or 'not at all' painful.
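The dichotomisation rule in the table notes for heavy and painful periods can be written as a small recoding function. This is a minimal Python sketch. It assumes that responses are available as the text labels quoted in the notes ("very", "moderately", "mildly", "not at all"). In practice, ALSPAC variables are likely to be stored as numeric codes, which would first need mapping to these labels. Treating unrecognised or missing responses as missing, rather than as "not heavy" or "not painful", is a design choice of this sketch and not a rule stated in the data note.

```python
from typing import Optional

# Response labels as quoted in the table notes; mapping from any numeric
# codes in the actual dataset is assumed to happen before this step.
SEVERE = {"very", "moderately"}
NOT_SEVERE = {"mildly", "not at all"}


def dichotomise(response: Optional[str]) -> Optional[bool]:
    """Return True for 'very'/'moderately', False for 'mildly'/'not at all',
    and None for missing or unrecognised responses."""
    if response is None:
        return None
    label = response.strip().lower()
    if label in SEVERE:
        return True
    if label in NOT_SEVERE:
        return False
    return None  # leave as missing rather than guess


heavy = [dichotomise(r) for r in ["Very", "mildly", "Not at all", None, "don't know"]]
```

The same function applies to both the heavy bleeding and the painful periods items, since the notes use the same four-level split. Here `heavy` is `[True, False, False, None, None]`.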

…answered further questions regarding their menstrual cycle features in two later 'child-completed' questionnaires (ages 19.6 and 21 years) and six clinic assessments (ages 11.7, 12.8, 13.8, 15.5, 17.8, and 24 years). Table 8 provides an overview of the relevant ALSPAC variables for each menstrual cycle feature at each timepoint, as well as the contraception variables at each available timepoint.
Table 7. Original and derived G0 premenstrual symptom variables (N (%))
Do you generally find in the days before or during your periods you have particular problems?
In the past year, have you had problems with your period?
*Participants are classified as experiencing the PMS-related symptom (very fatigued, irritable, depressed, anxious, or other) if they reported the symptom either before or during their period. They are classified as not experiencing the symptom if they reported it neither before nor during their period.
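The "before or during" rule in the Table 7 note can be expressed as a three-valued function. This is a minimal Python sketch. It assumes the before-period and during-period items for a symptom are available as booleans, with `None` for missing. Returning missing when one item is "no" and the other is missing is an assumption of this sketch: the note defines only the fully reported cases.

```python
from typing import Optional


def pms_symptom(before: Optional[bool], during: Optional[bool]) -> Optional[bool]:
    """Derive a PMS-related symptom indicator from before/during items.

    True  if the symptom was reported before OR during the period.
    False if it was reported neither before NOR during the period.
    None  otherwise (assumed: a 'no' alongside a missing item stays missing).
    """
    if before is True or during is True:
        return True
    if before is False and during is False:
        return False
    return None


derived = [pms_symptom(b, d) for b, d in [(True, None), (False, False), (False, None), (None, True)]]
```

Here `derived` is `[True, False, None, True]`.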

Table 9. Original and derived G1 cycle length variables (N (%))
Timepoints: 8.1, 9.6, 10.6, 11.6, 12.8 (clinic), 13.1, 14.6, 15.5, 16, 17, 19.6, 21, and 24 (clinic) years
Original variables
In the past year, what was the usual length of your daughter's menstrual cycle?
If your periods are regular, how long on average would you say your cycle is? (i.e. number of days between each period, e.g. 30, 28)

the past year, how many days of bleeding have you usually had during each period?
*Continuous responses recoded.

method of contraception (if any) are you or your sexual partner currently using? Cross true or false for each option